Genome annotation and transcriptomics of oil-producing algae
FA9550-10-1-0095

Reporting  period:  04/01/2010-12/31/2014 

Abstract 

Most  algae  accumulate  triacylglycerols  (TAGs)  when  they  are  starved  for  essential  nutrients  like 
N,  S,  P  (or  Si  in  the  case  of  some  diatoms).  We  had  proposed  to  use  whole  transcriptome 
analyses  to  detail  the  changes  in  gene  expression  that  occur  during  N-starvation  induced  TAG 
accumulation in Chlamydomonas. We used RNA-Seq on the Illumina platform for quantitative
determination  of  the  Chlamydomonas  transcriptome.  Deep  coverage  over  multiple  time  points  in 
nearly  a  dozen  different  experiments  in  at  least  3  strains  allowed  us  to  distinguish  early 
responses  to  the  stress  signal  and  to  unequivocally  identify  relevant  changes  in  RNA 
abundance.  This  led  to  the  functional  identification  of  8  acyltransferases,  several  of  which  were 
documented  to  have  a  role  in  TAG  accumulation  in  stressed  algal  cells,  and  candidate 
regulators  of  the  N  starvation  response.  We  used  our  RNA-Seq  data  analysis  pipeline  for 
building  and  quantitating  transcripts  from  Chlorella  and  Cyclotella  in  TAG-producing  conditions 
and  we  assembled  a  draft  genome  for  Cyclotella.  During  a  period  of  project  extension,  we 
generated  new  strand-specific  RNA-Seq  data  from  ribosomal  RNA-depleted  preparations  of 
RNA for the purpose of identifying regulatory lncRNAs. Some of these candidates, which map to
intergenic  regions,  show  sharp  transient  patterns  of  expression  during  the  cell  cycle  consistent 
with  a  regulatory  role. 

Key  Accomplishments 

1.  The  analysis  of  the  Chlamydomonas  transcriptome  in  N-starved  cells  in  3  time  course 
experiments  is  complete.  We  have  curated  enzymes  involved  in  TAG  metabolism.  Besides  the  5 
previously  described  DGTT1-DGTT5  genes  (encoding  type  2  diacylglycerol  acyltransferases), 
we  identified  two  additional  genes,  DGAT1  and  DGAT3,  encoding  distinct  candidate  enzymes. 
DGAT1  had  eluded  prior  discovery  because  of  incomplete  sequence  coverage  in  that  region  of 
the  genome.  By  using  new  sequence  data  and  manual  assembly,  we  have  increased  the 
sequence coverage of DGAT1 from ~50% to ~90%. DGTT1, DGAT1 and DGAT3 are
coordinately  expressed  in  N-starved  Chlamydomonas  cells.  The  genes  are  also  up-regulated  in 
other  stress  situations  that  promote  TAG  accumulation,  consistent  with  a  causal  connection.  We 
developed  an  in  vivo  method  for  testing  the  function  of  algal  genes  in  TAG  synthesis. 

DGAT3, conserved in the plant lineage, does not support TAG synthesis in this assay, so it may
instead have a regulatory role.

2. We have developed methods for, and gained considerable experience with, the
analysis of Illumina sequence data for transcriptome studies, and also for genome and transcript
assembly.  These  methods  have  been  applied  to  transcriptome  data  from  Chlorella  (sub-contract 
to  Sayre)  and  Cyclotella  (sub-contract  to  Hildebrand).  In  both  cases  the  reads  were  used  to 
assemble  better  transcript  models  and  estimate  transcript  abundance  under  specific  situations. 

3.  We  also  expanded  the  scope  of  the  project  (without  increases  in  cost)  to  include  genome 
sequencing  of  Cyclotella  cryptica  in  collaboration  with  the  Hildebrand  group.  We  generated 
libraries  from  DNA  and  RNA  and  performed  de  novo  assembly  to  generate  a  first  draft  of  the 
genome and transcriptome of Cyclotella cryptica. The first draft of the genome has an N50 (the
fragment length at or above which half of the assembled sequence lies) of 10 kb and a total
genome size of approximately 160 Mb. The total
genome  estimate  is  in  agreement  with  that  generated  previously  based  on  calorimetric 
measurements.  The  transcript-based  gene  models  include  alternative  forms  generated  by 
splicing or from different start sites. Gene models were constructed by using MAKER, which
combines multiple gene prediction tools including Augustus and Fgenesh. We found that
Augustus models that overlapped MAKER models were the most reliable, which allowed us to
generate a high-confidence set of approximately 10,000 genes. We also generated bisulfite data
to  produce  a  map  of  cytosine  methylation,  and  found  that  this  mark  is  associated  with  repeat 
regions  of  the  genome.  The  manuscript  describing  these  results  is  in  preparation. 


4.  We  generated  new  strand-specific  RNA-Seq  data  from  rRNA-depleted  samples  with  parallel 
ChIP-Seq  data  from  18  samples  collected  during  the  Chlamydomonas  cell  cycle.  We  used  these 
reads  to  assemble  nearly  900  new  transcripts  that  appear  to  be  long  non-coding  RNAs 
(lncRNAs). These lncRNAs map to intergenic regions and also to the opposite strand of
protein-coding mRNAs. In individual cases that were manually curated, the RNAs do not appear to code
for  long  proteins  and  their  pattern  of  expression  is  transient,  suggesting  that  they  may  be 
regulatory.  Manual  curation  also  indicates  that  the  5’  ends  have  chromatin  marks  that  are  typical 
for Pol II transcripts, consistent with these being lncRNAs. We are very interested in continuing
this  work  should  funding  permit. 
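The coding-potential check mentioned in item 4 (candidate RNAs "do not appear to code for long proteins") can be illustrated with a simple longest-open-reading-frame scan. This is only a minimal sketch under stated assumptions, not the curation procedure used in the project: the function names, the forward-strand-only scan (reasonable for strand-specific reads), and the 100-codon cutoff are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: a longest-ORF scan as one possible proxy for
# "does not code for a long protein". Names and the cutoff are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Return the length, in codons (stop excluded), of the longest
    ATG...stop open reading frame in the three forward-strand frames.
    ORFs that run off the end of the sequence without a stop are ignored."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def looks_noncoding(seq, max_codons=100):
    """Flag a transcript as a non-coding candidate when its longest ORF is
    shorter than max_codons (an assumed, adjustable threshold)."""
    return longest_orf_codons(seq) < max_codons


if __name__ == "__main__":
    toy_transcript = "CCATGAAATTTGGGTAACC"  # made-up sequence, not project data
    print(longest_orf_codons(toy_transcript), looks_noncoding(toy_transcript))
```

A scan like this only prioritizes candidates; the manual curation and chromatin-mark evidence described above remain the basis for calling a transcript a lncRNA.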

Background 

Most  algae  accumulate  triacylglycerols  (TAGs)  when  they  are  starved  for  essential  nutrients  like 
N,  S,  P  (or  Si  in  the  case  of  some  diatoms).  In  the  absence  of  such  essential  nutrients  they  are 
unable  to  synthesize  macromolecules  that  require  these  elements.  Therefore,  they  cannot  grow; 
rather,  they  divert  carbon  towards  storage  molecules  -  either  starch  or  neutral  lipids  like  TAGs. 
The  TAGs  are  precursors  for  biodiesel  because  the  fatty  acid  constituents  of  TAG  can  be 
transesterified  to  generate  methyl  esters. 

We  had  proposed  to  use  transcriptome  approaches  to  detail  the  changes  in  gene  expression 
that  occur  during  N-starvation  induced  TAG  accumulation  in  Chlamydomonas.  Chlamydomonas 
is  a  key  reference  organism  for  the  chlorophyte  algae  because  there  is  a  draft  genome  with  over 
17,000  gene  models  (many  of  them  manually  curated  and  functionally  annotated)  and  there  are 
resources  for  classical  and  reverse  genetics. 

We had also proposed to undertake RNA-Seq analysis of Chlorella (for another grantee, Richard
Sayre, at the Donald Danforth Plant Science Center at the time of the award) and to assemble a
transcriptome and genome for a diatom, Cyclotella cryptica (for another grantee, Mark Hildebrand,
at Scripps / UCSD). Chlorella was of interest because it grows to high density, and the diatom
had been noted previously to be a high producer of TAGs. Cyclotella is a promising alga for
biofuel production, but to date virtually nothing is known about its genome and genes.

Approach 

We used RNA-Seq on the Illumina platform for quantitative determination of the
Chlamydomonas  transcriptome.  Deep  coverage  over  multiple  time  points  in  nearly  a  dozen 
different  experiments  in  at  least  3  strains  allowed  us  to  distinguish  early  responses  to  the  stress 
signal  and  to  unequivocally  identify  relevant  changes  in  RNA  abundance.  Gene  models  were 
corrected based on RNA-Seq coverage, which allowed the proteins to be correctly expressed in
heterologous systems to validate the activity of the gene product. We also used a
reverse-genetic strategy as a form of validation of the role of individual genes in the TAG accumulation
pathway. The success of this project prompted us to use the RNA-Seq coverage to annotate
new (previously unobserved) transcripts from the Chlamydomonas genome, many of which
correspond to long non-coding RNAs (lncRNAs).
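Quantitative comparison of transcripts from RNA-Seq reads generally requires normalizing raw read counts for transcript length and sequencing depth. The report does not state which abundance measure was used, so the following is only a sketch of one common choice, transcripts per million (TPM); the function name and the toy numbers are assumptions for illustration.

```python
# Illustrative sketch only: TPM normalization of read counts.
# The abundance measure actually used in the project is not stated.


def tpm(counts, lengths_bp):
    """Convert per-transcript read counts to transcripts per million.

    counts     -- reads assigned to each transcript
    lengths_bp -- transcript lengths in base pairs, in the same order
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")
    # Reads per base: corrects for longer transcripts yielding more reads.
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so the values in one sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    toy_counts = [10, 20, 70]          # made-up read counts
    toy_lengths = [1000, 2000, 3500]   # made-up transcript lengths (bp)
    print(tpm(toy_counts, toy_lengths))
```

Because TPM values sum to the same total in every sample, they are convenient for comparing the relative abundance of a transcript across time points, although differential-expression testing normally works from the raw counts.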

Significance 

The fact that multiple distinct acyltransferase enzymes are upregulated during N starvation and
other stresses in Chlamydomonas indicates that there may be several sites / foci of TAG
synthesis. 

The transient pattern of expression of lncRNAs suggests that they may be regulatory. Hence we
are interested in pursuing this work to understand how lncRNAs impact the metabolic program,
especially during the transition from the carbon-fixing to the carbon-utilizing stage of the cell cycle.

Collaborators 

Christoph  Benning  (Michigan  State  University)  for  chemical  analysis  of  fatty  acids  in  TAG  and 
TAG  quantitation 

Arthur  Grossman  (Carnegie  Institution)  for  screening  for  loss  of  function  mutations  in  various 
genes. 


Mark  Hildebrand  (UCSD/SIO)  for  biology  of  Cyclotella 

Key  findings 

1. Correcting gene models against RNA-Seq coverage yields more reliable models, which in turn
allow the encoded proteins to be expressed in functional form.

2.  We  identified  several  TAG  synthesis  enzymes  that  had  not  previously  been  described  in 
Chlamydomonas  and  documented  their  activities  by  functional  complementation  in  yeast. 

3.  We  showed  that  TAG  accumulation  occurred  in  several  different  stress  situations  that  impact 
cell  growth  and  division,  with  evidence  for  increased  expression  of  the  diacylglycerol 
acyltransferase-encoding  genes  in  each  case. 